Overview

Dataset statistics

Number of variables: 20
Number of observations: 327346
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 136.0 MiB
Average record size in memory: 435.8 B

Variable types

NUM: 14
CAT: 6

Reproduction

Analysis started: 2020-04-20 02:20:17.871395
Analysis finished: 2020-04-20 02:51:09.057741
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
tailnum has a high cardinality: 4037 distinct values High cardinality
dest has a high cardinality: 104 distinct values High cardinality
time_hour has a high cardinality: 6922 distinct values High cardinality
sched_dep_time is highly correlated with dep_time and 1 other field High Correlation
dep_time is highly correlated with sched_dep_time and 1 other field High Correlation
arr_delay is highly correlated with dep_delay High Correlation
dep_delay is highly correlated with arr_delay High Correlation
distance is highly correlated with air_time High Correlation
air_time is highly correlated with distance High Correlation
hour is highly correlated with dep_time and 1 other field High Correlation
time_hour only contains datetime values, but is categorical. Consider applying pd.to_datetime() Type
dep_delay has 16466 (5.0%) zeros Zeros
arr_delay has 5409 (1.7%) zeros Zeros
minute has 58924 (18.0%) zeros Zeros
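
The Type warning above suggests converting time_hour with pd.to_datetime(). The values shown in this report (for example "20-09-2013 08:00") look day-first, so an explicit format is safer than letting a parser guess. The sketch below uses only the standard library to make that format explicit; the constant and function names are illustrative, and it assumes every value follows the same 16-character day-month-year layout. The same format string could be passed to pd.to_datetime through its format parameter.

```python
from datetime import datetime

# Assumed layout of time_hour, inferred from values such as "20-09-2013 08:00".
TIME_HOUR_FORMAT = "%d-%m-%Y %H:%M"


def parse_time_hour(text):
    """Parse one time_hour string as day-month-year hour:minute."""
    return datetime.strptime(text, TIME_HOUR_FORMAT)


# Values copied from the time_hour frequency table and the sample rows.
samples = ["20-09-2013 08:00", "01-01-2013 05:00", "30-09-2013 23:00"]
parsed = [parse_time_hour(s) for s in samples]

for raw, value in zip(samples, parsed):
    print(raw, "->", value.isoformat())
```

Spelling out the day-first format avoids silently reading "09-10-2013" as September 10 instead of October 9.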

Variables

df_index
Real number (ℝ≥0)

UNIQUE
Distinct count327346
Unique (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean168190.6378
Minimum0
Maximum336769
Zeros1
Zeros (%)< 0.1%
Memory size2.5 MiB

Quantile statistics

Minimum0
5-th percentile16581.25
Q183007.25
median168251.5
Q3252782.75
95-th percentile320245.75
Maximum336769
Range336769
Interquartile range (IQR)169775.5

Descriptive statistics

Standard deviation97510.31438
Coefficient of variation (CV)0.5797606553
Kurtosis-1.205125435
Mean168190.6378
Median Absolute Deviation (MAD)84458.5635
Skewness0.002508376679
Sum5.505653252e+10
Variance9508261411
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 13955.5 14003.5 21823.5 21860.5 ... 319156.5 319184.5 319982.5 320176.5 336769. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2047 1 < 0.1%
 
166660 1 < 0.1%
 
328437 1 < 0.1%
 
334582 1 < 0.1%
 
332535 1 < 0.1%
 
174848 1 < 0.1%
 
172801 1 < 0.1%
 
178946 1 < 0.1%
 
176899 1 < 0.1%
 
164613 1 < 0.1%
 
Other values (327336) 327336 > 99.9%
 
ValueCountFrequency (%) 
0 1 < 0.1%
 
1 1 < 0.1%
 
2 1 < 0.1%
 
3 1 < 0.1%
 
4 1 < 0.1%
 
ValueCountFrequency (%) 
336769 1 < 0.1%
 
336768 1 < 0.1%
 
336767 1 < 0.1%
 
336766 1 < 0.1%
 
336765 1 < 0.1%
 

year
Categorical

CONSTANT
REJECTED
Distinct count1
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.5 MiB
2013
327346
ValueCountFrequency (%) 
2013 327346 100.0%
 

Length

Max length4
Mean length4
Min length4
ValueCountFrequency (%) 
Decimal_Number 4 100.0%
 
ValueCountFrequency (%) 
Common 4 100.0%
 
ValueCountFrequency (%) 
ASCII 4 100.0%
 

month
Real number (ℝ≥0)

Distinct count12
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.564802991
Minimum1
Maximum12
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum1
5-th percentile1
Q14
median7
Q310
95-th percentile12
Maximum12
Range11
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.413444381
Coefficient of variation (CV)0.5199614346
Kurtosis-1.188328477
Mean6.564802991
Median Absolute Deviation (MAD)2.95801638
Skewness-0.02362709289
Sum2148962
Variance11.65160254
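
Several of the descriptive statistics are simple functions of one another. As a rough consistency check, the sketch below recomputes the mean, coefficient of variation, variance, IQR, and range for month from the other figures printed in this report; the variable names are illustrative, and small differences in the last digits would be expected because the inputs are rounded.

```python
# Figures copied from the report for the "month" variable.
n_observations = 327346
total = 2148962          # Sum
std = 3.413444381        # Standard deviation
q1, q3 = 4, 10           # Q1 and Q3
minimum, maximum = 1, 12

mean = total / n_observations   # Mean = Sum / number of observations
cv = std / mean                 # Coefficient of variation = std / mean
variance = std ** 2             # Variance = std squared
iqr = q3 - q1                   # Interquartile range = Q3 - Q1
value_range = maximum - minimum # Range = Maximum - Minimum

print(f"mean={mean:.6f} cv={cv:.6f} variance={variance:.5f} "
      f"iqr={iqr} range={value_range}")
```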
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 5.5 6.5 8.5 9.5 10.5 11.5 12. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
8 28756 8.8%
 
10 28618 8.7%
 
7 28293 8.6%
 
5 28128 8.6%
 
3 27902 8.5%
 
4 27564 8.4%
 
6 27075 8.3%
 
12 27020 8.3%
 
9 27010 8.3%
 
11 26971 8.2%
 
Other values (2) 50009 15.3%
 
ValueCountFrequency (%) 
1 26398 8.1%
 
2 23611 7.2%
 
3 27902 8.5%
 
4 27564 8.4%
 
5 28128 8.6%
 
ValueCountFrequency (%) 
12 27020 8.3%
 
11 26971 8.2%
 
10 28618 8.7%
 
9 27010 8.3%
 
8 28756 8.8%
 

day
Real number (ℝ≥0)

Distinct count31
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.74082469
Minimum1
Maximum31
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum1
5-th percentile2
Q18
median16
Q323
95-th percentile29
Maximum31
Range30
Interquartile range (IQR)15

Descriptive statistics

Standard deviation8.777376041
Coefficient of variation (CV)0.5576185627
Kurtosis-1.185600327
Mean15.74082469
Median Absolute Deviation (MAD)7.58678047
Skewness-0.001125534436
Sum5152696
Variance77.04233016
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 4.5 7.5 ... 22.5 27.5 28.5 30.5 31. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
15 11150 3.4%
 
18 11131 3.4%
 
3 11070 3.4%
 
21 11017 3.4%
 
22 10985 3.4%
 
11 10983 3.4%
 
20 10974 3.4%
 
17 10961 3.3%
 
4 10949 3.3%
 
27 10845 3.3%
 
Other values (21) 217281 66.4%
 
ValueCountFrequency (%) 
1 10748 3.3%
 
2 10524 3.2%
 
3 11070 3.4%
 
4 10949 3.3%
 
5 10609 3.2%
 
ValueCountFrequency (%) 
31 6038 1.8%
 
30 10023 3.1%
 
29 9916 3.0%
 
28 10394 3.2%
 
27 10845 3.3%
 

dep_time
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count1317
Unique (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1348.789883
Minimum1
Maximum2400
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum1
5-th percentile624
Q1907
median1400
Q31744
95-th percentile2112
Maximum2400
Range2399
Interquartile range (IQR)837

Descriptive statistics

Standard deviation488.3199792
Coefficient of variation (CV)0.3620430324
Kurtosis-1.089029272
Mean1348.789883
Median Absolute Deviation (MAD)423.7592117
Skewness-0.02340292549
Sum441520973
Variance238456.4021
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.0000e+00 1.3500e+01 3.2500e+01 5.8500e+01 1.0050e+02 ... 2.3345e+03 2.3465e+03 2.3515e+03 2.3585e+03 2.4000e+03], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
555 833 0.3%
 
556 817 0.2%
 
755 816 0.2%
 
557 798 0.2%
 
655 793 0.2%
 
1455 767 0.2%
 
1454 766 0.2%
 
654 745 0.2%
 
855 740 0.2%
 
756 739 0.2%
 
Other values (1307) 319532 97.6%
 
ValueCountFrequency (%) 
1 25 < 0.1%
 
2 35 < 0.1%
 
3 26 < 0.1%
 
4 26 < 0.1%
 
5 20 < 0.1%
 
ValueCountFrequency (%) 
2400 29 < 0.1%
 
2359 54 < 0.1%
 
2358 76 < 0.1%
 
2357 74 < 0.1%
 
2356 74 < 0.1%
 

sched_dep_time
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count1020
Unique (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1340.335098
Minimum500
Maximum2359
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum500
5-th percentile630
Q1905
median1355
Q31729
95-th percentile2050
Maximum2359
Range1859
Interquartile range (IQR)824

Descriptive statistics

Standard deviation467.4131564
Coefficient of variation (CV)0.3487285807
Kurtosis-1.198520237
Mean1340.335098
Median Absolute Deviation (MAD)407.089483
Skewness0.006235546313
Sum438753333
Variance218475.0588
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 500. 500.5 512.5 515.5 518.5 ... 2310. 2348.5 2353.5 2358.5 2359. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
600 6836 2.1%
 
700 4822 1.5%
 
630 4690 1.4%
 
900 4666 1.4%
 
1200 4521 1.4%
 
1700 4380 1.3%
 
1600 3971 1.2%
 
800 3862 1.2%
 
1300 3573 1.1%
 
1900 3544 1.1%
 
Other values (1010) 282481 86.3%
 
ValueCountFrequency (%) 
500 340 0.1%
 
501 1 < 0.1%
 
505 2 < 0.1%
 
510 5 < 0.1%
 
515 205 0.1%
 
ValueCountFrequency (%) 
2359 810 0.2%
 
2358 44 < 0.1%
 
2355 73 < 0.1%
 
2352 16 < 0.1%
 
2345 1 < 0.1%
 

dep_delay
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count526
Unique (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean12.55515571
Minimum-43
Maximum1301
Zeros16466
Zeros (%)5.0%
Memory size2.5 MiB

Quantile statistics

Minimum-43
5-th percentile-9
Q1-5
median-2
Q311
95-th percentile88
Maximum1301
Range1344
Interquartile range (IQR)16

Descriptive statistics

Standard deviation40.06568759
Coefficient of variation (CV)3.19117409
Kurtosis44.35504307
Mean12.55515571
Median Absolute Deviation (MAD)23.07396344
Skewness4.818017945
Sum4109880
Variance1605.259322
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ -43. -24.5 -20.5 -18.5 -16.5 ... 392.5 438.5 512. 905. 1301. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
-5 24765 7.6%
 
-4 24557 7.5%
 
-3 24158 7.4%
 
-2 21463 6.6%
 
-6 20649 6.3%
 
-1 18761 5.7%
 
-7 16714 5.1%
 
0 16466 5.0%
 
-8 11770 3.6%
 
1 8026 2.5%
 
Other values (516) 140017 42.8%
 
ValueCountFrequency (%) 
-43 1 < 0.1%
 
-33 1 < 0.1%
 
-32 1 < 0.1%
 
-30 1 < 0.1%
 
-27 1 < 0.1%
 
ValueCountFrequency (%) 
1301 1 < 0.1%
 
1137 1 < 0.1%
 
1126 1 < 0.1%
 
1014 1 < 0.1%
 
1005 1 < 0.1%
 

arr_time
Real number (ℝ≥0)

Distinct count1410
Unique (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1501.908238
Minimum1
Maximum2400
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum1
5-th percentile736
Q11104
median1535
Q31940
95-th percentile2248
Maximum2400
Range2399
Interquartile range (IQR)836

Descriptive statistics

Standard deviation532.8887311
Coefficient of variation (CV)0.3548077823
Kurtosis-0.1946780458
Mean1501.908238
Median Absolute Deviation (MAD)447.5545315
Skewness-0.4656925901
Sum491643654
Variance283970.3997
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.0000e+00 1.5000e+00 5.5000e+00 1.8500e+01 3.3500e+01 ... 2.2585e+03 2.3005e+03 2.3095e+03 2.3585e+03 2.4000e+03], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
1008 484 0.1%
 
1013 484 0.1%
 
1015 479 0.1%
 
1012 464 0.1%
 
1005 460 0.1%
 
1016 459 0.1%
 
1006 459 0.1%
 
1011 457 0.1%
 
1007 456 0.1%
 
1040 455 0.1%
 
Other values (1400) 322689 98.6%
 
ValueCountFrequency (%) 
1 201 0.1%
 
2 163 < 0.1%
 
3 174 0.1%
 
4 172 0.1%
 
5 205 0.1%
 
ValueCountFrequency (%) 
2400 150 < 0.1%
 
2359 221 0.1%
 
2358 187 0.1%
 
2357 207 0.1%
 
2356 201 0.1%
 

sched_arr_time
Real number (ℝ≥0)

Distinct count1162
Unique (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1532.788426
Minimum1
Maximum2359
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum1
5-th percentile815
Q11122
median1554
Q31944
95-th percentile2246
Maximum2359
Range2358
Interquartile range (IQR)822

Descriptive statistics

Standard deviation497.9791245
Coefficient of variation (CV)0.3248844499
Kurtosis-0.384548899
Mean1532.788426
Median Absolute Deviation (MAD)423.8191843
Skewness-0.3444786467
Sum501752160
Variance247983.2084
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.0000e+00 1.5000e+00 2.5000e+00 3.5000e+00 5.5000e+00 ... 2.3525e+03 2.3535e+03 2.3575e+03 2.3585e+03 2.3590e+03], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
1025 1294 0.4%
 
2015 1201 0.4%
 
1110 1191 0.4%
 
1115 1163 0.4%
 
1235 1119 0.3%
 
2359 1091 0.3%
 
1815 1064 0.3%
 
1015 1057 0.3%
 
1220 1056 0.3%
 
1310 1047 0.3%
 
Other values (1152) 316063 96.6%
 
ValueCountFrequency (%) 
1 235 0.1%
 
2 92 < 0.1%
 
3 158 < 0.1%
 
4 103 < 0.1%
 
5 82 < 0.1%
 
ValueCountFrequency (%) 
2359 1091 0.3%
 
2358 481 0.1%
 
2357 345 0.1%
 
2356 460 0.1%
 
2355 329 0.1%
 

arr_delay
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count577
Unique (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.895376757
Minimum-86
Maximum1272
Zeros5409
Zeros (%)1.7%
Memory size2.5 MiB

Quantile statistics

Minimum-86
5-th percentile-32
Q1-17
median-5
Q314
95-th percentile91
Maximum1272
Range1358
Interquartile range (IQR)31

Descriptive statistics

Standard deviation44.63329169
Coefficient of variation (CV)6.472930089
Kurtosis29.233044
Mean6.895376757
Median Absolute Deviation (MAD)27.76627155
Skewness3.71681748
Sum2257174
Variance1992.130727
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ -86. -70.5 -66.5 -62.5 -57.5 ... 386.5 423. 498. 923. 1272. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
-13 7177 2.2%
 
-10 7088 2.2%
 
-12 7046 2.2%
 
-14 6975 2.1%
 
-11 6863 2.1%
 
-9 6815 2.1%
 
-15 6796 2.1%
 
-7 6677 2.0%
 
-17 6668 2.0%
 
-8 6663 2.0%
 
Other values (567) 258578 79.0%
 
ValueCountFrequency (%) 
-86 1 < 0.1%
 
-79 1 < 0.1%
 
-75 2 < 0.1%
 
-74 1 < 0.1%
 
-73 1 < 0.1%
 
ValueCountFrequency (%) 
1272 1 < 0.1%
 
1127 1 < 0.1%
 
1109 1 < 0.1%
 
1007 1 < 0.1%
 
989 1 < 0.1%
 

carrier
Categorical

Distinct count16
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.5 MiB
UA
57782
B6
54049
EV
51108
DL
47658
AA
31947
Other values (11)
84802
ValueCountFrequency (%) 
UA 57782 17.7%
 
B6 54049 16.5%
 
EV 51108 15.6%
 
DL 47658 14.6%
 
AA 31947 9.8%
 
MQ 25037 7.6%
 
US 19831 6.1%
 
9E 17294 5.3%
 
WN 12044 3.7%
 
VX 5116 1.6%
 
Other values (6) 5480 1.7%
 

Length

Max length2
Mean length2
Min length2
ValueCountFrequency (%) 
Uppercase_Letter 17 89.5%
 
Decimal_Number 2 10.5%
 
ValueCountFrequency (%) 
Latin 17 89.5%
 
Common 2 10.5%
 
ValueCountFrequency (%) 
ASCII 19 100.0%
 

flight
Real number (ℝ≥0)

Distinct count3835
Unique (%)1.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1943.104501
Minimum1
Maximum8500
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum1
5-th percentile87
Q1544
median1467
Q33412
95-th percentile4689
Maximum8500
Range8499
Interquartile range (IQR)2868

Descriptive statistics

Standard deviation1621.523684
Coefficient of variation (CV)0.8345015324
Kurtosis-0.7907590658
Mean1943.104501
Median Absolute Deviation (MAD)1377.983526
Skewness0.6930968219
Sum636067486
Variance2629339.057
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.0000e+00 1.5000e+00 2.5000e+00 3.5000e+00 5.5000e+00 ... 6.1725e+03 6.1785e+03 6.1805e+03 7.3405e+03 8.5000e+03], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
15 956 0.3%
 
27 886 0.3%
 
181 875 0.3%
 
301 852 0.3%
 
161 780 0.2%
 
695 756 0.2%
 
1109 709 0.2%
 
745 697 0.2%
 
1 697 0.2%
 
359 694 0.2%
 
Other values (3825) 319444 97.6%
 
ValueCountFrequency (%) 
1 697 0.2%
 
2 51 < 0.1%
 
3 628 0.2%
 
4 391 0.1%
 
5 324 0.1%
 
ValueCountFrequency (%) 
8500 1 < 0.1%
 
6181 80 < 0.1%
 
6180 6 < 0.1%
 
6177 160 < 0.1%
 
6168 2 < 0.1%
 

tailnum
Categorical

HIGH CARDINALITY
Distinct count4037
Unique (%)1.2%
Missing0
Missing (%)0.0%
Memory size2.5 MiB
N725MQ
 
544
N722MQ
 
485
N723MQ
 
475
N711MQ
 
462
N713MQ
 
449
Other values (4032)
324931
ValueCountFrequency (%) 
N725MQ 544 0.2%
 
N722MQ 485 0.1%
 
N723MQ 475 0.1%
 
N711MQ 462 0.1%
 
N713MQ 449 0.1%
 
N258JB 420 0.1%
 
N353JB 403 0.1%
 
N298JB 402 0.1%
 
N351JB 391 0.1%
 
N328AA 389 0.1%
 
Other values (4027) 322926 98.6%
 

Length

Max length6
Mean length5.995179413
Min length5
ValueCountFrequency (%) 
Uppercase_Letter 24 70.6%
 
Decimal_Number 10 29.4%
 
ValueCountFrequency (%) 
Latin 24 70.6%
 
Common 10 29.4%
 
ValueCountFrequency (%) 
ASCII 34 100.0%
 

origin
Categorical

Distinct count3
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.5 MiB
EWR
117127
JFK
109079
LGA
101140
ValueCountFrequency (%) 
EWR 117127 35.8%
 
JFK 109079 33.3%
 
LGA 101140 30.9%
 

Length

Max length3
Mean length3
Min length3
ValueCountFrequency (%) 
Uppercase_Letter 9 100.0%
 
ValueCountFrequency (%) 
Latin 9 100.0%
 
ValueCountFrequency (%) 
ASCII 9 100.0%
 

dest
Categorical

HIGH CARDINALITY
Distinct count104
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.5 MiB
ATL
 
16837
ORD
 
16566
LAX
 
16026
BOS
 
15022
MCO
 
13967
Other values (99)
248928
ValueCountFrequency (%) 
ATL 16837 5.1%
 
ORD 16566 5.1%
 
LAX 16026 4.9%
 
BOS 15022 4.6%
 
MCO 13967 4.3%
 
CLT 13674 4.2%
 
SFO 13173 4.0%
 
FLL 11897 3.6%
 
MIA 11593 3.5%
 
DCA 9111 2.8%
 
Other values (94) 189480 57.9%
 

Length

Max length3
Mean length3
Min length3
ValueCountFrequency (%) 
Uppercase_Letter 26 100.0%
 
ValueCountFrequency (%) 
Latin 26 100.0%
 
ValueCountFrequency (%) 
ASCII 26 100.0%
 

air_time
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count509
Unique (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean150.6864602
Minimum20
Maximum695
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum20
5-th percentile40
Q182
median129
Q3192
95-th percentile339
Maximum695
Range675
Interquartile range (IQR)110

Descriptive statistics

Standard deviation93.68830466
Coefficient of variation (CV)0.6217433506
Kurtosis0.8630769908
Mean150.6864602
Median Absolute Deviation (MAD)72.7175711
Skewness1.070705186
Sum49326610
Variance8777.49843
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 20. 21.5 22.5 25.5 28.5 ... 597.5 622.5 640.5 660.5 695. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
42 2552 0.8%
 
43 2543 0.8%
 
41 2513 0.8%
 
45 2495 0.8%
 
40 2466 0.8%
 
44 2444 0.7%
 
39 2411 0.7%
 
47 2409 0.7%
 
46 2406 0.7%
 
109 2377 0.7%
 
Other values (499) 302730 92.5%
 
ValueCountFrequency (%) 
20 2 < 0.1%
 
21 14 < 0.1%
 
22 34 < 0.1%
 
23 82 < 0.1%
 
24 103 < 0.1%
 
ValueCountFrequency (%) 
695 1 < 0.1%
 
691 1 < 0.1%
 
686 2 < 0.1%
 
683 1 < 0.1%
 
679 1 < 0.1%
 

distance
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count213
Unique (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1048.371314
Minimum80
Maximum4983
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum80
5-th percentile199
Q1509
median888
Q31389
95-th percentile2475
Maximum4983
Range4903
Interquartile range (IQR)880

Descriptive statistics

Standard deviation735.9085231
Coefficient of variation (CV)0.7019540821
Kurtosis1.14911845
Mean1048.371314
Median Absolute Deviation (MAD)568.1345654
Skewness1.113392621
Sum343180156
Variance541561.3544
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 80. 87. 95. 106. 151.5 ... 2581. 2978. 4166.5 4973. 4983. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2475 11159 3.4%
 
762 10041 3.1%
 
733 8507 2.6%
 
2586 8109 2.5%
 
544 5961 1.8%
 
719 5828 1.8%
 
187 5773 1.8%
 
1096 5702 1.7%
 
2454 5646 1.7%
 
944 5429 1.7%
 
Other values (203) 255191 78.0%
 
ValueCountFrequency (%) 
80 48 < 0.1%
 
94 895 0.3%
 
96 598 0.2%
 
116 412 0.1%
 
143 418 0.1%
 
ValueCountFrequency (%) 
4983 342 0.1%
 
4963 359 0.1%
 
3370 8 < 0.1%
 
2586 8109 2.5%
 
2576 309 0.1%
 

hour
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count19
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.14100982
Minimum5
Maximum23
Zeros0
Zeros (%)0.0%
Memory size2.5 MiB

Quantile statistics

Minimum5
5-th percentile6
Q19
median13
Q317
95-th percentile20
Maximum23
Range18
Interquartile range (IQR)8

Descriptive statistics

Standard deviation4.662062914
Coefficient of variation (CV)0.354772044
Kurtosis-1.206908044
Mean13.14100982
Median Absolute Deviation (MAD)4.053830471
Skewness0.01154287952
Sum4301657
Variance21.73483062
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 5. 5.5 6.5 7.5 8.5 ... 19.5 20.5 21.5 22.5 23. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
8 26734 8.2%
 
6 25447 7.8%
 
17 23667 7.2%
 
15 23082 7.1%
 
7 22475 6.9%
 
16 22045 6.7%
 
18 21072 6.4%
 
14 21022 6.4%
 
19 20507 6.3%
 
9 19931 6.1%
 
Other values (9) 101364 31.0%
 
ValueCountFrequency (%) 
5 1940 0.6%
 
6 25447 7.8%
 
7 22475 6.9%
 
8 26734 8.2%
 
9 19931 6.1%
 
ValueCountFrequency (%) 
23 1042 0.3%
 
22 2558 0.8%
 
21 10503 3.2%
 
20 16061 4.9%
 
19 20507 6.3%
 

minute
Real number (ℝ≥0)

ZEROS
Distinct count60
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean26.2341162
Minimum0
Maximum59
Zeros58924
Zeros (%)18.0%
Memory size2.5 MiB

Quantile statistics

Minimum0
5-th percentile0
Q18
median29
Q344
95-th percentile58
Maximum59
Range59
Interquartile range (IQR)36

Descriptive statistics

Standard deviation19.29591774
Coefficient of variation (CV)0.7355276464
Kurtosis-1.234587472
Mean26.2341162
Median Absolute Deviation (MAD)16.60165774
Skewness0.09257147917
Sum8587633
Variance372.3324414
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 4.5 ... 55.5 56.5 57.5 58.5 59. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 58924 18.0%
 
30 33033 10.1%
 
45 19871 6.1%
 
15 18365 5.6%
 
55 18290 5.6%
 
59 15817 4.8%
 
10 14135 4.3%
 
25 14030 4.3%
 
5 13690 4.2%
 
29 13453 4.1%
 
Other values (50) 107738 32.9%
 
ValueCountFrequency (%) 
0 58924 18.0%
 
1 2085 0.6%
 
2 818 0.2%
 
3 1381 0.4%
 
4 1322 0.4%
 
ValueCountFrequency (%) 
59 15817 4.8%
 
58 1038 0.3%
 
57 1335 0.4%
 
56 1665 0.5%
 
55 18290 5.6%
 

time_hour
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count6922
Unique (%)2.1%
Missing0
Missing (%)0.0%
Memory size2.5 MiB
20-09-2013 08:00
 
94
23-09-2013 08:00
 
93
16-09-2013 08:00
 
92
09-09-2013 08:00
 
92
19-09-2013 08:00
 
92
Other values (6917)
326883
ValueCountFrequency (%) 
20-09-2013 08:00 94 < 0.1%
 
23-09-2013 08:00 93 < 0.1%
 
16-09-2013 08:00 92 < 0.1%
 
09-09-2013 08:00 92 < 0.1%
 
19-09-2013 08:00 92 < 0.1%
 
10-09-2013 08:00 91 < 0.1%
 
23-10-2013 08:00 91 < 0.1%
 
09-10-2013 08:00 91 < 0.1%
 
18-09-2013 08:00 91 < 0.1%
 
21-10-2013 08:00 90 < 0.1%
 
Other values (6912) 326429 99.7%
 

Length

Max length16
Mean length16
Min length16
ValueCountFrequency (%) 
Decimal_Number 10 76.9%
 
Other_Punctuation 1 7.7%
 
Dash_Punctuation 1 7.7%
 
Space_Separator 1 7.7%
 
ValueCountFrequency (%) 
Common 13 100.0%
 
ValueCountFrequency (%) 
ASCII 13 100.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
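
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator runs; the function name and the sample values are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Made-up illustrative values, not taken from the dataset.
distance = [200, 500, 1000, 1400, 2500]
air_time = [40, 80, 150, 200, 340]

r = pearson_r(distance, air_time)
# Shifting and rescaling one variable (here with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([d * 1.609 + 10 for d in distance], air_time)
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.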

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
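
A minimal sketch of that recipe follows: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names are illustrative, and giving tied values the average of their positions is an assumption of this sketch (a common convention) rather than something stated in the report.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (made-up values).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this cubic relationship ρ reaches its maximum while Pearson's r stays below it, which is the difference between monotonic and linear correlation described above.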

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
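
The pair-counting definition above can be written directly as a short sketch. The function name is illustrative, and this version treats tied pairs as neither concordant nor discordant; statistical libraries often report a tie-adjusted variant, so results may differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up values: one swapped pair out of six gives (5 - 1) / 6.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The quadratic loop over all pairs is fine for illustration but would be slow on a table with hundreds of thousands of rows.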

Missing values

Sample

First rows

row  df_index  year  month  day  dep_time  sched_dep_time  dep_delay  arr_time  sched_arr_time  arr_delay  carrier  flight  tailnum  origin  dest  air_time  distance  hour  minute  time_hour
0    0         2013  1      1    517.0     515             2.0        830.0     819             11.0       UA       1545    N14228   EWR     IAH   227.0     1400      5     15      01-01-2013 05:00
1    1         2013  1      1    533.0     529             4.0        850.0     830             20.0       UA       1714    N24211   LGA     IAH   227.0     1416      5     29      01-01-2013 05:00
2    2         2013  1      1    542.0     540             2.0        923.0     850             33.0       AA       1141    N619AA   JFK     MIA   160.0     1089      5     40      01-01-2013 05:00
3    3         2013  1      1    544.0     545             -1.0       1004.0    1022            -18.0      B6       725     N804JB   JFK     BQN   183.0     1576      5     45      01-01-2013 05:00
4    4         2013  1      1    554.0     600             -6.0       812.0     837             -25.0      DL       461     N668DN   LGA     ATL   116.0     762       6     0       01-01-2013 06:00
5    5         2013  1      1    554.0     558             -4.0       740.0     728             12.0       UA       1696    N39463   EWR     ORD   150.0     719       5     58      01-01-2013 05:00
6    6         2013  1      1    555.0     600             -5.0       913.0     854             19.0       B6       507     N516JB   EWR     FLL   158.0     1065      6     0       01-01-2013 06:00
7    7         2013  1      1    557.0     600             -3.0       709.0     723             -14.0      EV       5708    N829AS   LGA     IAD   53.0      229       6     0       01-01-2013 06:00
8    8         2013  1      1    557.0     600             -3.0       838.0     846             -8.0       B6       79      N593JB   JFK     MCO   140.0     944       6     0       01-01-2013 06:00
9    9         2013  1      1    558.0     600             -2.0       753.0     745             8.0        AA       301     N3ALAA   LGA     ORD   138.0     733       6     0       01-01-2013 06:00

Last rows

row     df_index  year  month  day  dep_time  sched_dep_time  dep_delay  arr_time  sched_arr_time  arr_delay  carrier  flight  tailnum  origin  dest  air_time  distance  hour  minute  time_hour
327336  336760    2013  9      30   2211.0    2059            72.0       2339.0    2242            57.0       EV       4672    N12145   EWR     STL   120.0     872       20    59      30-09-2013 20:00
327337  336761    2013  9      30   2231.0    2245            -14.0      2335.0    2356            -21.0      B6       108     N193JB   JFK     PWM   48.0      273       22    45      30-09-2013 22:00
327338  336762    2013  9      30   2233.0    2113            80.0       112.0     30              42.0       UA       471     N578UA   EWR     SFO   318.0     2565      21    13      30-09-2013 21:00
327339  336763    2013  9      30   2235.0    2001            154.0      59.0      2249            130.0      B6       1083    N804JB   JFK     MCO   123.0     944       20    1       30-09-2013 20:00
327340  336764    2013  9      30   2237.0    2245            -8.0       2345.0    2353            -8.0       B6       234     N318JB   JFK     BTV   43.0      266       22    45      30-09-2013 22:00
327341  336765    2013  9      30   2240.0    2245            -5.0       2334.0    2351            -17.0      B6       1816    N354JB   JFK     SYR   41.0      209       22    45      30-09-2013 22:00
327342  336766    2013  9      30   2240.0    2250            -10.0      2347.0    7               -20.0      B6       2002    N281JB   JFK     BUF   52.0      301       22    50      30-09-2013 22:00
327343  336767    2013  9      30   2241.0    2246            -5.0       2345.0    1               -16.0      B6       486     N346JB   JFK     ROC   47.0      264       22    46      30-09-2013 22:00
327344  336768    2013  9      30   2307.0    2255            12.0       2359.0    2358            1.0        B6       718     N565JB   JFK     BOS   33.0      187       22    55      30-09-2013 22:00
327345  336769    2013  9      30   2349.0    2359            -10.0      325.0     350             -25.0      B6       745     N516JB   JFK     PSE   196.0     1617      23    59      30-09-2013 23:00